A lexicon for Vietnamese language processing
Internal identifier: 004D23 (Main/Exploration); previous: 004D22; next: 004D24
Authors: Thị Minh Huyền Nguyễn [Viêt Nam]; Laurent Romary [France]; Mathias Rossignol [Viêt Nam]; Xuân Lương Vũ [Viêt Nam]
Source:
- Language Resources and Evaluation [ 1574-020X ] ; 2006-12-01.
English descriptors
- KwdEn : Lexicon, Linguistic resources, Part-of-speech, Standardization, Syntactic description, Vietnamese.
Abstract
Abstract: Only very recently have Vietnamese researchers begun to be involved in the domain of Natural Language Processing (NLP). As there does not exist any published work in formal linguistics nor any recognizable standard for Vietnamese word definition and word categories, the fundamental tasks for automatic Vietnamese language processing, such as part-of-speech tagging, parsing, etc., are very difficult tasks for computer scientists. The fact that all necessary linguistic resources have to be built from scratch by each research team is a real obstacle to the development of Vietnamese language processing. The aim of our projects is thus to build a common linguistic database that is freely and easily exploitable for the automatic processing of Vietnamese. In this paper, we present our work on creating a Vietnamese lexicon for NLP applications. We emphasize the standardization aspect of the lexicon representation. We especially propose an extensible set of Vietnamese syntactic descriptions that can be used for tagset definition and morphosyntactic analysis. These descriptors are established in such a way as to be a reference set proposal for Vietnamese in the context of ISO subcommittee TC 37/SC 4 (Language Resource Management).
Url: https://api.istex.fr/ark:/67375/VQC-G67GFMPF-9/fulltext.pdf
DOI: 10.1007/s10579-007-9034-8
Affiliations:
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 001662
- to stream Istex, to step Curation: 001643
- to stream Istex, to step Checkpoint: 001108
- to stream Main, to step Merge: 004E57
- to stream Main, to step Curation: 004D23
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">A lexicon for Vietnamese language processing</title>
<author><name sortKey="Nguy, Th Minh Huy" sort="Nguy, Th Minh Huy" uniqKey="Nguy T" first="Th Minh Huy" last="Nguy">Th Minh Huy Nguy</name>
</author>
<author><name sortKey="Romary, Laurent" sort="Romary, Laurent" uniqKey="Romary L" first="Laurent" last="Romary">Laurent Romary</name>
</author>
<author><name sortKey="Rossignol, Mathias" sort="Rossignol, Mathias" uniqKey="Rossignol M" first="Mathias" last="Rossignol">Mathias Rossignol</name>
</author>
<author><name sortKey="V, Xuan L Ng" sort="V, Xuan L Ng" uniqKey="V X" first="Xuân L Ng" last="V">Xuân L Ng V</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:6125B8F3F38C1AB2E4070555916396C65C331950</idno>
<date when="2007" year="2007">2007</date>
<idno type="doi">10.1007/s10579-007-9034-8</idno>
<idno type="url">https://api.istex.fr/ark:/67375/VQC-G67GFMPF-9/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001662</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">001662</idno>
<idno type="wicri:Area/Istex/Curation">001643</idno>
<idno type="wicri:Area/Istex/Checkpoint">001108</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">001108</idno>
<idno type="wicri:doubleKey">1574-020X:2007:Nguy T:a:lexicon:for</idno>
<idno type="wicri:Area/Main/Merge">004E57</idno>
<idno type="wicri:Area/Main/Curation">004D23</idno>
<idno type="wicri:Area/Main/Exploration">004D23</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">A lexicon for Vietnamese language processing</title>
<author><name sortKey="Nguy, Th Minh Huy" sort="Nguy, Th Minh Huy" uniqKey="Nguy T" first="Th Minh Huy" last="Nguy">Th Minh Huy Nguy</name>
<affiliation wicri:level="1"><country xml:lang="fr">Viêt Nam</country>
<wicri:regionArea>Faculty of Mathematics, Mechanics and Informatics, Hanoi University of Science, 334 Nguyen Trai, 10000, Hanoi</wicri:regionArea>
<wicri:noRegion>Hanoi</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Viêt Nam</country>
</affiliation>
</author>
<author><name sortKey="Romary, Laurent" sort="Romary, Laurent" uniqKey="Romary L" first="Laurent" last="Romary">Laurent Romary</name>
<affiliation wicri:level="3"><country xml:lang="fr">France</country>
<wicri:regionArea>LORIA, Nancy</wicri:regionArea>
<placeName><region type="region">Grand Est</region>
<region type="old region">Lorraine (région)</region>
<settlement type="city">Nancy</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">France</country>
</affiliation>
</author>
<author><name sortKey="Rossignol, Mathias" sort="Rossignol, Mathias" uniqKey="Rossignol M" first="Mathias" last="Rossignol">Mathias Rossignol</name>
<affiliation wicri:level="1"><country xml:lang="fr">Viêt Nam</country>
<wicri:regionArea>International Research Center MICA, Hanoi</wicri:regionArea>
<wicri:noRegion>Hanoi</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Viêt Nam</country>
</affiliation>
</author>
<author><name sortKey="V, Xuan L Ng" sort="V, Xuan L Ng" uniqKey="V X" first="Xuân L Ng" last="V">Xuân L Ng V</name>
<affiliation wicri:level="1"><country xml:lang="fr">Viêt Nam</country>
<wicri:regionArea>Vietnam Lexicography Center, Hanoi</wicri:regionArea>
<wicri:noRegion>Hanoi</wicri:noRegion>
</affiliation>
<affiliation></affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Language Resources and Evaluation</title>
<title level="j" type="abbrev">Lang Resources &amp; Evaluation</title>
<idno type="ISSN">1574-020X</idno>
<idno type="eISSN">1572-0218</idno>
<imprint><publisher>Springer Netherlands</publisher>
<pubPlace>Dordrecht</pubPlace>
<date type="published" when="2006-12-01">2006-12-01</date>
<biblScope unit="volume">40</biblScope>
<biblScope unit="issue">3-4</biblScope>
<biblScope unit="page" from="291">291</biblScope>
<biblScope unit="page" to="309">309</biblScope>
</imprint>
<idno type="ISSN">1574-020X</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">1574-020X</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Lexicon</term>
<term>Linguistic resources</term>
<term>Part-of-speech</term>
<term>Standardization</term>
<term>Syntactic description</term>
<term>Vietnamese</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: Only very recently have Vietnamese researchers begun to be involved in the domain of Natural Language Processing (NLP). As there does not exist any published work in formal linguistics nor any recognizable standard for Vietnamese word definition and word categories, the fundamental tasks for automatic Vietnamese language processing, such as part-of-speech tagging, parsing, etc., are very difficult tasks for computer scientists. The fact that all necessary linguistic resources have to be built from scratch by each research team is a real obstacle to the development of Vietnamese language processing. The aim of our projects is thus to build a common linguistic database that is freely and easily exploitable for the automatic processing of Vietnamese. In this paper, we present our work on creating a Vietnamese lexicon for NLP applications. We emphasize the standardization aspect of the lexicon representation. We especially propose an extensible set of Vietnamese syntactic descriptions that can be used for tagset definition and morphosyntactic analysis. These descriptors are established in such a way as to be a reference set proposal for Vietnamese in the context of ISO subcommittee TC 37/SC 4 (Language Resource Management).</div>
</front>
</TEI>
<affiliations><list><country><li>France</li>
<li>Viêt Nam</li>
</country>
<region><li>Grand Est</li>
<li>Lorraine (région)</li>
</region>
<settlement><li>Nancy</li>
</settlement>
</list>
<tree><country name="Viêt Nam"><noRegion><name sortKey="Nguy, Th Minh Huy" sort="Nguy, Th Minh Huy" uniqKey="Nguy T" first="Th Minh Huy" last="Nguy">Th Minh Huy Nguy</name>
</noRegion>
<name sortKey="Nguy, Th Minh Huy" sort="Nguy, Th Minh Huy" uniqKey="Nguy T" first="Th Minh Huy" last="Nguy">Th Minh Huy Nguy</name>
<name sortKey="Rossignol, Mathias" sort="Rossignol, Mathias" uniqKey="Rossignol M" first="Mathias" last="Rossignol">Mathias Rossignol</name>
<name sortKey="Rossignol, Mathias" sort="Rossignol, Mathias" uniqKey="Rossignol M" first="Mathias" last="Rossignol">Mathias Rossignol</name>
<name sortKey="V, Xuan L Ng" sort="V, Xuan L Ng" uniqKey="V X" first="Xuân L Ng" last="V">Xuân L Ng V</name>
</country>
<country name="France"><region name="Grand Est"><name sortKey="Romary, Laurent" sort="Romary, Laurent" uniqKey="Romary L" first="Laurent" last="Romary">Laurent Romary</name>
</region>
<name sortKey="Romary, Laurent" sort="Romary, Laurent" uniqKey="Romary L" first="Laurent" last="Romary">Laurent Romary</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 004D23 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 004D23 | SxmlIndent | more
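Once a record has been extracted with the commands above, its fields can be read programmatically. The following is a minimal Python sketch and is not part of Dilib. It assumes the record text is available as a string (here a trimmed copy of the record shown earlier). Because the record uses the `wicri:` prefix without declaring it, a strict XML parser would reject it as is; the sketch binds the prefix to a placeholder URI, `urn:example:wicri`, which is purely an assumption. The helper names `parse_record` and `summarize` are hypothetical. Any bare `&` in element text would also need to be escaped before parsing.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record above; real input would be the HfdSelect output.
SAMPLE = """<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">A lexicon for Vietnamese language processing</title>
</titleStmt>
<publicationStmt><idno type="doi">10.1007/s10579-007-9034-8</idno>
</publicationStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Lexicon</term>
<term>Vietnamese</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
</TEI>
</record>"""

# Placeholder namespace URI (an assumption: the record itself declares none).
WICRI_NS = "urn:example:wicri"


def parse_record(text):
    """Parse one record after binding the undeclared wicri: prefix."""
    patched = text.replace("<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1)
    return ET.fromstring(patched)


def summarize(root):
    """Return the title, DOI and English keywords of a parsed record."""
    title = root.findtext(".//titleStmt/title")
    doi = root.findtext(".//idno[@type='doi']")
    keywords = [t.text for t in root.findall(".//keywords[@scheme='KwdEn']/term")]
    return {"title": title, "doi": doi, "keywords": keywords}


summary = summarize(parse_record(SAMPLE))
print(summary)
```

Injecting the namespace declaration on the root element leaves the rest of the record untouched, so the same approach should carry over to the full record provided its text is otherwise well-formed.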
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Lorraine |area= InforLorV4 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:6125B8F3F38C1AB2E4070555916396C65C331950 |texte= A lexicon for Vietnamese language processing }}
This area was generated with Dilib version V0.6.33.